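Facial valence/arousal estimation, expression recognition, and action unit detection are related tasks in facial affective analysis. However, because the data are collected under widely varying conditions, these tasks achieve only limited performance in the wild. The 4th competition on Affective Behavior Analysis in-the-Wild (ABAW) provides images with valence/arousal, expression, and action unit labels. In this paper, we introduce a multi-task learning framework to enhance performance on the three related tasks in the wild. Feature sharing and label fusion are used to exploit the relations among them. We conduct experiments on the provided training and validation data.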
translated by 谷歌翻译
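Learning from synthetic images plays an important role in facial expression recognition because real images are difficult to label, and it is challenging because of the gap between synthetic and real images. The 4th Affective Behavior Analysis in-the-Wild competition raises this challenge and provides synthetic images generated from the Aff-Wild2 dataset. In this paper, we propose a hand-assisted expression recognition method to reduce the gap between synthetic and real data. Our method consists of two parts: an expression recognition module and a hand prediction module. The expression recognition module extracts expression information, and the hand prediction module predicts whether the image contains a hand. A decision mode combines the results of the two modules, and a post-processing step is used to improve the result. The F1 score is used to verify the effectiveness of our method.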
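The sharpness of microscopic images is vital in biological research and diagnosis. When microscopic images are taken at the cellular or molecular level, mechanical drift occurs and can be difficult and expensive to counter. This problem can be overcome with an end-to-end deep-learning workflow that predicts in-focus microscopic images from their out-of-focus counterparts. Our model adopts a multi-level U-Net structure in which the levels are connected head to tail through their corresponding convolutional layers. In contrast to conventional coarse-to-fine models, our model transfers knowledge distilled from the coarse network to the finer network. We evaluate our model and, by comparing its results with those of existing models, find that our method is effective and performs better.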
The ubiquity of edge devices has led to a growing amount of unlabeled data produced at the edge. Deep learning models deployed on edge devices need to learn from this unlabeled data to continuously improve accuracy. Self-supervised representation learning has achieved promising performance on centralized unlabeled data. However, growing awareness of privacy protection limits the centralization of distributed unlabeled image data from edge devices. Federated learning has been widely adopted to enable distributed machine learning with privacy preservation, but without a method for efficiently selecting streaming data, the traditional federated learning framework cannot handle these huge amounts of decentralized unlabeled data with the limited storage available at the edge. To address these challenges, we propose a Federated on-device Contrastive learning framework with Coreset selection, which we call FedCoCo, to automatically select a coreset consisting of the most representative samples for the replay buffer on each device. It preserves data privacy because each client learns good visual representations without sharing raw data. Experiments demonstrate the effectiveness and significance of the proposed method in visual representation learning.
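The abstract does not say which criterion FedCoCo uses to pick representative samples. As a rough illustration of what coreset selection for a fixed-size replay buffer can look like, the sketch below uses k-center greedy selection, a generic technique swapped in here for illustration. The function name `k_center_greedy` and the plain-list embedding format are assumptions, not part of the paper.

```python
import math


def k_center_greedy(embeddings, budget):
    """Return indices of up to `budget` points chosen by k-center greedy.

    Hypothetical helper: a generic coreset heuristic, not the selection
    rule of FedCoCo, which the abstract does not describe.
    """
    if not embeddings or budget <= 0:
        return []
    selected = [0]  # arbitrary seed point
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(budget, len(embeddings)):
        # Pick the point that is farthest from the current selection.
        nxt = max(range(len(embeddings)), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0.0:
            break  # every remaining point duplicates a selected one
        selected.append(nxt)
        min_dist = [
            min(d, math.dist(e, embeddings[nxt]))
            for d, e in zip(min_dist, embeddings)
        ]
    return selected
```

With two tight clusters and a budget of two, the selection keeps one sample from each cluster, which is the kind of coverage a small replay buffer would want.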
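This paper presents a novel pothole detection approach based on single-modal semantic segmentation. It first extracts visual features from the input image using a convolutional neural network. A channel attention module then reweights the channel features to enhance the consistency of the different feature maps. Subsequently, we employ an atrous spatial pyramid pooling module (comprising atrous convolutions in series with progressively increasing dilation rates) to integrate spatial context information. This helps to better distinguish potholes from undamaged road areas. Finally, the feature maps of adjacent layers are fused using our proposed multi-scale feature fusion module, which further reduces the semantic gap between different feature channel layers. Extensive experiments on the Pothole-600 dataset demonstrate the effectiveness of the proposed method. Quantitative comparisons show that our method achieves state-of-the-art (SOTA) performance on both RGB images and transformed disparity images, outperforming three SOTA single-modal semantic segmentation networks.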
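The abstract does not give the internals of its channel attention module. A minimal sketch of channel reweighting in the common squeeze-and-excitation style, assuming global average pooling followed by two small fully connected layers, might look as follows. The function `channel_attention` and the weight matrices `w1` and `w2` are hypothetical names introduced for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_maps, w1, w2):
    """Reweight channels in a squeeze-and-excitation style (assumed design).

    feature_maps: list of C channels, each an H x W list of lists.
    w1: hidden x C weight matrix; w2: C x hidden weight matrix.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excite: FC -> ReLU -> FC -> sigmoid gives one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale every value in a channel by that channel's gate.
    reweighted = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_maps)
    ]
    return reweighted, gates
```

In a trained network the weights would be learned, so that informative channels receive gates close to 1 and less useful ones are suppressed.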
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. We then systematically categorize and summarize existing work in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, in which a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods do not take fairness into consideration. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. We then propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
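For readers unfamiliar with how a student is encouraged to mimic a teacher, the sketch below shows the widely used soft-target distillation loss: the KL divergence between temperature-softened teacher and student distributions. It illustrates generic KD only and does not include the fairness mechanism of RELIANT, which the abstract does not detail. The function names and the default temperature are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Generic soft-target KD for a single example; assumed for
    illustration, not taken from the RELIANT paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge. In practice it is usually combined with the ordinary supervised loss on ground-truth labels.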
This paper focuses on designing efficient models with few parameters and FLOPs for dense prediction. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of a Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN-/Transformer-based models while trading off model accuracy and efficiency well.